D3S: Debugging Deployed Distributed Systems
Abstract
Testing large-scale distributed systems is a challenge because some errors manifest themselves only after a distributed sequence of events that involves machine and network failures. D3S is a checker that allows developers to specify predicates on distributed properties of a deployed system, and that checks these predicates while the system is running. When D3S finds a problem, it produces the sequence of state changes that led to the problem, allowing developers to quickly find the root cause. Developers write predicates in a simple, sequential programming style, while D3S checks these predicates in a distributed and parallel manner so that checking scales to large systems and is fault tolerant. By using binary instrumentation, D3S works transparently with legacy systems and can change the predicates being checked at runtime. An evaluation with 5 deployed systems shows that D3S can detect non-trivial correctness and performance bugs at runtime with low performance overhead (less than 8%).
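The sketch below illustrates the abstract's central idea. A predicate is written as ordinary sequential code, and the checker splits the work into independent partitions and evaluates them in parallel. On a violation, it returns the sequence of state changes that led to it.

Everything here is hypothetical and is not D3S's actual interface. That includes the `StateChange` event format, the lock-service invariant, and the functions `conflicting_exclusive`, `check_partition` and `check_all`. The sketch also assumes that instrumented nodes already export events carrying a global timestamp order. A real checker has to establish that order itself.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StateChange:
    """Hypothetical state-change event exported by an instrumented node."""
    timestamp: int  # assumed to give a global order across all nodes
    node: str       # client whose state changed
    lock_id: str    # lock the change refers to
    op: str         # "acquire" or "release"
    mode: str       # "exclusive" or "shared"


def conflicting_exclusive(holders: Dict[str, str]) -> bool:
    """Sequential predicate: an exclusive holder must be the only holder."""
    exclusive = [n for n, m in holders.items() if m == "exclusive"]
    return len(exclusive) > 1 or (len(exclusive) == 1 and len(holders) > 1)


def check_partition(events: List[StateChange]) -> Optional[List[StateChange]]:
    """Replay one lock's events in timestamp order.

    Returns the state changes that led to the first violation, or None.
    """
    holders: Dict[str, str] = {}
    history: List[StateChange] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        history.append(ev)
        if ev.op == "acquire":
            holders[ev.node] = ev.mode
        elif ev.op == "release":
            holders.pop(ev.node, None)
        if conflicting_exclusive(holders):
            return history
    return None


def check_all(events: List[StateChange], workers: int = 4) -> Dict[str, List[StateChange]]:
    """Partition events by lock ID and check the partitions in parallel."""
    partitions: Dict[str, List[StateChange]] = defaultdict(list)
    for ev in events:
        partitions[ev.lock_id].append(ev)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = list(pool.map(check_partition, partitions.values()))
    return {lock: trace for lock, trace in zip(partitions.keys(), results)
            if trace is not None}


if __name__ == "__main__":
    events = [
        StateChange(1, "a", "L1", "acquire", "exclusive"),
        StateChange(2, "a", "L1", "release", "exclusive"),
        StateChange(3, "b", "L1", "acquire", "shared"),
        StateChange(4, "c", "L1", "acquire", "shared"),
        StateChange(1, "a", "L2", "acquire", "exclusive"),
        StateChange(2, "b", "L2", "acquire", "exclusive"),
    ]
    for lock, trace in check_all(events).items():
        # prints: L2 [(1, 'a', 'acquire'), (2, 'b', 'acquire')]
        print(lock, [(e.timestamp, e.node, e.op) for e in trace])
```

Lock `L1` passes, because its exclusive hold is released before the shared holders arrive. Lock `L2` fails when a second client acquires it exclusively, and the returned trace is the two events that produced the conflict.

Partitioning by lock ID works here only because this invariant never relates state across different locks. A thread pool stands in for the separate checker machines. Because of Python's global interpreter lock, CPU-bound checking would need `ProcessPoolExecutor` or separate processes to run truly in parallel.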
Similar resources
Live debugging of distributed systems
Debugging distributed systems is challenging. Although incremental debugging during development finds some bugs, developers are rarely able to fully test their systems under realistic operating conditions prior to deployment. While deploying a system exposes it to realistic conditions, debugging requires the developer to: (i) detect a bug, (ii) gather the system state necessary for diagnosis, a...
Comparison, Replay, and Refinement of Communication Traces for Debugging Distributed Failures
An increasing number of companies build their business on distributed Web applications. Hosting providers have responded to that demand and made it easier to deploy systems that spread across multiple services. However, this trend has outpaced the development of adequate debugging tools, and developers still have to rely on an improvised patchwork of symbolic debuggers and printf debugging to find fail...
Towards Lightweight Logging and Replay of Embedded, Distributed Systems (Invited Paper)
Due to their safety-critical nature, Cyber-Physical Systems such as collaborative cars or smart grids demand thorough testing and evaluation. However, debugging such systems during deployment is challenging, due to the concurrent nature of distributed systems and the limited insight that any deployed system offers. In this paper we introduce MILD, providing Minimal Intrusive Logging and Det...
Peeking into Spammer Behavior from a Unique Vantage Point
... from the logs. This can entail considerable developer effort, and getting just the right level of logging can require many iterations: Too much logging can produce unacceptable overhead, but too little will miss key state changes. And even after the logs are captured, analysis remains challenging. D3S attempts to simplify the process of runtime assertion checking...
Publication date: 2008